Overview
Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 2964624 |
| Missing cells | 700810 |
| Missing cells (%) | 1.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 395.8 MiB |
| Average record size in memory | 140.0 B |
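The derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch, assuming the missing percentage is taken over all cells (variables × observations) and that 1 MiB is 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Values copied from the Dataset statistics table.
n_variables = 19
n_observations = 2_964_624
missing_cells = 700_810
total_size_mib = 395.8

# Share of missing cells over every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The missing-cell total also equals 5 × 140162, which matches the five variables that are each reported with 140162 missing values.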
Variable types
| Categorical | 4 |
|---|---|
| DateTime | 2 |
| Numeric | 12 |
| Boolean | 1 |
Variable descriptions
| object_ID | {'metrics': ['missing']} |
|---|---|
| territory_ID | {'metrics': ['missing', 'type']} |
| is_person_account | {'metrics': ['missing', 'infinite']} |
| account_primary_country_code | {'metrics': ['missing', 'distinct']} |
| account_country_code | {'metrics': ['missing', 'unique']} |
| territory_name | {'metrics': ['missing', 'mean']} |
Alerts
| Alert | Type |
|---|---|
| store_and_fwd_flag is highly imbalanced (96.2%) | Imbalance |
| payment_type is highly imbalanced (55.4%) | Imbalance |
| improvement_surcharge is highly imbalanced (95.7%) | Imbalance |
| Airport_fee is highly imbalanced (72.9%) | Imbalance |
| passenger_count has 140162 (4.7%) missing values | Missing |
| RatecodeID has 140162 (4.7%) missing values | Missing |
| store_and_fwd_flag has 140162 (4.7%) missing values | Missing |
| congestion_surcharge has 140162 (4.7%) missing values | Missing |
| Airport_fee has 140162 (4.7%) missing values | Missing |
| trip_distance is highly skewed (γ1 = 1001.887885) | Skewed |
| passenger_count has 31465 (1.1%) zeros | Zeros |
| trip_distance has 60371 (2.0%) zeros | Zeros |
| extra has 1290548 (43.5%) zeros | Zeros |
| mta_tax has 29707 (1.0%) zeros | Zeros |
| tip_amount has 710292 (24.0%) zeros | Zeros |
| tolls_amount has 2753809 (92.9%) zeros | Zeros |
| congestion_surcharge has 217877 (7.3%) zeros | Zeros |
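The report does not state how the imbalance percentage is computed. The figures for store_and_fwd_flag and payment_type are consistent with one minus the normalized Shannon entropy of the non-missing class counts, so the sketch below uses that formula as an assumption; `imbalance_score` is a hypothetical helper, not a library function.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), with H the Shannon entropy of the class shares.

    Assumed formula: 0 means perfectly balanced classes, 1 means a single
    class holds every observation.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Non-missing class counts taken from the per-variable tables of this report.
store_and_fwd_flag = [2_813_126, 11_336]
payment_type = [2_319_046, 439_191, 140_162, 46_628, 19_597]

score_flag = imbalance_score(store_and_fwd_flag)
score_payment = imbalance_score(payment_type)
print(f"store_and_fwd_flag: {score_flag:.1%}")
print(f"payment_type: {score_payment:.1%}")
```

Under this assumption the two scores round to the 96.2% and 55.4% listed in the alerts.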
Reproduction
| Analysis started | 2025-03-01 04:59:38.056950 |
|---|---|
| Analysis finished | 2025-03-01 05:01:44.469952 |
| Duration | 2 minutes and 6.41 seconds |
| Software version | ydata-profiling v4.12.2 |
| Download configuration | config.json |
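The reported duration can be recomputed from the two timestamps. A small sketch using only the standard library; the format string is an assumption about how the timestamps were written.

```python
from datetime import datetime

# Timestamp layout assumed from the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2025-03-01 04:59:38.056950", FMT)
finished = datetime.strptime("2025-03-01 05:01:44.469952", FMT)

# Split the elapsed time into whole minutes and remaining seconds.
elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```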
Variables
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 2234632 | 75.4% |
| 1 | 729732 | 24.6% |
| 6 | 260 | < 0.1% |
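The frequency column can be recomputed from the counts and the total number of observations (2964624). The sketch below mirrors the display convention visible in this report, where shares that would round to 0.0% are shown as "< 0.1%"; `format_share` is a hypothetical helper and not necessarily how the profiling library formats its output.

```python
def format_share(count, total):
    """Format count/total as a percentage with one decimal place."""
    share = count / total
    if 0 < share < 0.0005:  # would otherwise print as 0.0%
        return "< 0.1%"
    return f"{share:.1%}"

total_rows = 2_964_624
counts = {"2": 2_234_632, "1": 729_732, "6": 260}

shares = [format_share(c, total_rows) for c in counts.values()]
for value, share in zip(counts, shares):
    print(value, share)
```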
| Distinct | 1575706 |
|---|---|
| Distinct (%) | 53.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.6 MiB |
| Minimum | 2002-12-31 22:59:39 |
|---|---|
| Maximum | 2024-02-01 00:01:15 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
| Distinct | 1574780 |
|---|---|
| Distinct (%) | 53.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.6 MiB |
| Minimum | 2002-12-31 23:05:41 |
|---|---|
| Maximum | 2024-02-02 13:56:52 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
passenger_count
Real number (ℝ)
Missing  Zeros 
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 140162 |
| Missing (%) | 4.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3392809 |
| Minimum | 0 |
|---|---|
| Maximum | 9 |
| Zeros | 31465 |
| Zeros (%) | 1.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.85028169 |
|---|---|
| Coefficient of variation (CV) | 0.63487928 |
| Kurtosis | 10.671029 |
| Mean | 1.3392809 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.0389422 |
| Sum | 3782748 |
| Variance | 0.72297896 |
| Monotonicity | Not monotonic |
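Several of the descriptive statistics are related by simple identities, which gives a quick consistency check on the table. A minimal sketch with values copied from the passenger_count tables; it assumes the mean and sum are computed over non-missing rows only.

```python
import math

# Values copied from the passenger_count tables.
mean = 1.3392809
std = 0.85028169
n_observations = 2_964_624
n_missing = 140_162

# Coefficient of variation is the standard deviation divided by the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

# Sum equals the mean times the number of non-missing rows (assumption).
total = mean * (n_observations - n_missing)

print(f"CV = {cv:.8f}")
print(f"Variance = {variance:.8f}")
print(f"Sum = {total:.0f}")
print(math.isclose(cv, 0.63487928, rel_tol=1e-6))
```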
| Value | Count | Frequency (%) |
| 1 | 2188739 | 73.8% |
| 2 | 405103 | 13.7% |
| 3 | 91262 | 3.1% |
| 4 | 51974 | 1.8% |
| 5 | 33506 | 1.1% |
| 0 | 31465 | 1.1% |
| 6 | 22353 | 0.8% |
| 8 | 51 | < 0.1% |
| 7 | 8 | < 0.1% |
| 9 | 1 | < 0.1% |
| (Missing) | 140162 | 4.7% |
| Value | Count | Frequency (%) |
| 0 | 31465 | 1.1% |
| 1 | 2188739 | 73.8% |
| 2 | 405103 | 13.7% |
| 3 | 91262 | 3.1% |
| 4 | 51974 | 1.8% |
| 5 | 33506 | 1.1% |
| 6 | 22353 | 0.8% |
| 7 | 8 | < 0.1% |
| 8 | 51 | < 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 9 | 1 | < 0.1% |
| 8 | 51 | < 0.1% |
| 7 | 8 | < 0.1% |
| 6 | 22353 | 0.8% |
| 5 | 33506 | 1.1% |
| 4 | 51974 | 1.8% |
| 3 | 91262 | 3.1% |
| 2 | 405103 | 13.7% |
| 1 | 2188739 | 73.8% |
| 0 | 31465 | 1.1% |
trip_distance
Real number (ℝ)
Skewed  Zeros 
| Distinct | 4489 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.6521692 |
| Minimum | 0 |
|---|---|
| Maximum | 312722.3 |
| Zeros | 60371 |
| Zeros (%) | 2.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.43 |
| Q1 | 1 |
| median | 1.68 |
| Q3 | 3.11 |
| 95-th percentile | 13.69 |
| Maximum | 312722.3 |
| Range | 312722.3 |
| Interquartile range (IQR) | 2.11 |
Descriptive statistics
| Standard deviation | 225.46257 |
|---|---|
| Coefficient of variation (CV) | 61.73388 |
| Kurtosis | 1281274.3 |
| Mean | 3.6521692 |
| Median Absolute Deviation (MAD) | 0.86 |
| Skewness | 1001.8879 |
| Sum | 10827308 |
| Variance | 50833.372 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 60371 | 2.0% |
| 0.9 | 40455 | 1.4% |
| 1 | 40192 | 1.4% |
| 0.8 | 39964 | 1.3% |
| 1.1 | 38662 | 1.3% |
| 0.7 | 37603 | 1.3% |
| 1.2 | 36917 | 1.2% |
| 1.3 | 35131 | 1.2% |
| 1.4 | 33111 | 1.1% |
| 0.6 | 32791 | 1.1% |
| Other values (4479) | 2569427 | 86.7% |
| Value | Count | Frequency (%) |
| 0 | 60371 | 2.0% |
| 0.01 | 2396 | 0.1% |
| 0.02 | 1652 | 0.1% |
| 0.03 | 1270 | < 0.1% |
| 0.04 | 1012 | < 0.1% |
| 0.05 | 766 | < 0.1% |
| 0.06 | 662 | < 0.1% |
| 0.07 | 576 | < 0.1% |
| 0.08 | 575 | < 0.1% |
| 0.09 | 459 | < 0.1% |
| Value | Count | Frequency (%) |
| 312722.3 | 1 | < 0.1% |
| 97793.92 | 1 | < 0.1% |
| 82015.45 | 1 | < 0.1% |
| 72975.97 | 1 | < 0.1% |
| 71752.26 | 1 | < 0.1% |
| 59282.45 | 1 | < 0.1% |
| 59076.43 | 1 | < 0.1% |
| 58298.51 | 1 | < 0.1% |
| 51619.36 | 1 | < 0.1% |
| 44018.64 | 1 | < 0.1% |
RatecodeID
Real number (ℝ)
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 140162 |
| Missing (%) | 4.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.0693594 |
| Minimum | 1 |
|---|---|
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 9.823219 |
|---|---|
| Coefficient of variation (CV) | 4.7469854 |
| Kurtosis | 93.209258 |
| Mean | 2.0693594 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 9.7490649 |
| Sum | 5844827 |
| Variance | 96.495631 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2663350 | 89.8% |
| 2 | 98713 | 3.3% |
| 99 | 28663 | 1.0% |
| 5 | 19410 | 0.7% |
| 3 | 7954 | 0.3% |
| 4 | 6365 | 0.2% |
| 6 | 7 | < 0.1% |
| (Missing) | 140162 | 4.7% |
| Value | Count | Frequency (%) |
| 1 | 2663350 | 89.8% |
| 2 | 98713 | 3.3% |
| 3 | 7954 | 0.3% |
| 4 | 6365 | 0.2% |
| 5 | 19410 | 0.7% |
| 6 | 7 | < 0.1% |
| 99 | 28663 | 1.0% |
| Value | Count | Frequency (%) |
| 99 | 28663 | 1.0% |
| 6 | 7 | < 0.1% |
| 5 | 19410 | 0.7% |
| 4 | 6365 | 0.2% |
| 3 | 7954 | 0.3% |
| 2 | 98713 | 3.3% |
| 1 | 2663350 | 89.8% |
store_and_fwd_flag
Boolean
Imbalance  Missing 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 140162 |
| Missing (%) | 4.7% |
| Memory size | 5.7 MiB |
| False | 2813126 |
|---|---|
| True | 11336 |
| (Missing) | 140162 |
| Value | Count | Frequency (%) |
| False | 2813126 | 94.9% |
| True | 11336 | 0.4% |
| (Missing) | 140162 | 4.7% |
PULocationID
Real number (ℝ)
| Distinct | 260 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 166.01788 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 132 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 249 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 102 |
Descriptive statistics
| Standard deviation | 63.623914 |
|---|---|
| Coefficient of variation (CV) | 0.38323531 |
| Kurtosis | -0.82977668 |
| Mean | 166.01788 |
| Median Absolute Deviation (MAD) | 62 |
| Skewness | -0.27225227 |
| Sum | 4.921806 × 10⁸ |
| Variance | 4048.0025 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 132 | 145240 | 4.9% |
| 161 | 143471 | 4.8% |
| 237 | 142708 | 4.8% |
| 236 | 136465 | 4.6% |
| 162 | 106717 | 3.6% |
| 230 | 106324 | 3.6% |
| 186 | 104523 | 3.5% |
| 142 | 104080 | 3.5% |
| 138 | 89533 | 3.0% |
| 239 | 88474 | 3.0% |
| Other values (250) | 1797089 | 60.6% |
| Value | Count | Frequency (%) |
| 1 | 295 | < 0.1% |
| 2 | 3 | < 0.1% |
| 3 | 105 | < 0.1% |
| 4 | 3568 | 0.1% |
| 6 | 21 | < 0.1% |
| 7 | 1811 | 0.1% |
| 8 | 11 | < 0.1% |
| 9 | 57 | < 0.1% |
| 10 | 999 | < 0.1% |
| 11 | 58 | < 0.1% |
| Value | Count | Frequency (%) |
| 265 | 1658 | 0.1% |
| 264 | 10360 | 0.3% |
| 263 | 59797 | 2.0% |
| 262 | 42801 | 1.4% |
| 261 | 12893 | 0.4% |
| 260 | 813 | < 0.1% |
| 259 | 119 | < 0.1% |
| 258 | 185 | < 0.1% |
| 257 | 78 | < 0.1% |
| 256 | 912 | < 0.1% |
DOLocationID
Real number (ℝ)
| Distinct | 261 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 165.11671 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 43 |
| Q1 | 114 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 261 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 120 |
Descriptive statistics
| Standard deviation | 69.31535 |
|---|---|
| Coefficient of variation (CV) | 0.41979609 |
| Kurtosis | -0.90603826 |
| Mean | 165.11671 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | -0.37551746 |
| Sum | 4.8950897 × 10⁸ |
| Variance | 4804.6177 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 236 | 142044 | 4.8% |
| 237 | 130249 | 4.4% |
| 161 | 111942 | 3.8% |
| 230 | 90603 | 3.1% |
| 142 | 89673 | 3.0% |
| 239 | 89105 | 3.0% |
| 170 | 86733 | 2.9% |
| 162 | 85238 | 2.9% |
| 141 | 83562 | 2.8% |
| 68 | 74517 | 2.5% |
| Other values (251) | 1980958 | 66.8% |
| Value | Count | Frequency (%) |
| 1 | 7176 | 0.2% |
| 2 | 4 | < 0.1% |
| 3 | 247 | < 0.1% |
| 4 | 11536 | 0.4% |
| 5 | 9 | < 0.1% |
| 6 | 62 | < 0.1% |
| 7 | 7738 | 0.3% |
| 8 | 45 | < 0.1% |
| 9 | 284 | < 0.1% |
| 10 | 2665 | 0.1% |
| Value | Count | Frequency (%) |
| 265 | 11967 | 0.4% |
| 264 | 16116 | 0.5% |
| 263 | 64989 | 2.2% |
| 262 | 48328 | 1.6% |
| 261 | 12617 | 0.4% |
| 260 | 2200 | 0.1% |
| 259 | 349 | < 0.1% |
| 258 | 732 | < 0.1% |
| 257 | 1096 | < 0.1% |
| 256 | 5465 | 0.2% |
payment_type
Categorical
Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.6 MiB |
| 1 | 2319046 |
|---|---|
| 2 | 439191 |
| 0 | 140162 |
| 4 | 46628 |
| 3 | 19597 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 2319046 | 78.2% |
| 2 | 439191 | 14.8% |
| 0 | 140162 | 4.7% |
| 4 | 46628 | 1.6% |
| 3 | 19597 | 0.7% |
fare_amount
Real number (ℝ)
| Distinct | 8970 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.175062 |
| Minimum | -899 |
|---|---|
| Maximum | 5000 |
| Zeros | 893 |
| Zeros (%) | < 0.1% |
| Negative | 37448 |
| Negative (%) | 1.3% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | -899 |
|---|---|
| 5-th percentile | 5.1 |
| Q1 | 8.6 |
| median | 12.8 |
| Q3 | 20.5 |
| 95-th percentile | 61.8 |
| Maximum | 5000 |
| Range | 5899 |
| Interquartile range (IQR) | 11.9 |
Descriptive statistics
| Standard deviation | 18.949548 |
|---|---|
| Coefficient of variation (CV) | 1.0426126 |
| Kurtosis | 3653.4671 |
| Mean | 18.175062 |
| Median Absolute Deviation (MAD) | 4.9 |
| Skewness | 18.150372 |
| Sum | 53882225 |
| Variance | 359.08536 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8.6 | 140879 | 4.8% |
| 7.9 | 139456 | 4.7% |
| 9.3 | 138462 | 4.7% |
| 10 | 135501 | 4.6% |
| 7.2 | 133066 | 4.5% |
| 10.7 | 127631 | 4.3% |
| 11.4 | 120337 | 4.1% |
| 6.5 | 118249 | 4.0% |
| 12.1 | 112320 | 3.8% |
| 12.8 | 103324 | 3.5% |
| Other values (8960) | 1695399 | 57.2% |
| Value | Count | Frequency (%) |
| -899 | 1 | < 0.1% |
| -800 | 2 | < 0.1% |
| -744.3 | 1 | < 0.1% |
| -709 | 1 | < 0.1% |
| -700 | 1 | < 0.1% |
| -670 | 1 | < 0.1% |
| -669.4 | 1 | < 0.1% |
| -650 | 1 | < 0.1% |
| -607.8 | 1 | < 0.1% |
| -600 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 5000 | 2 | < 0.1% |
| 2500 | 3 | < 0.1% |
| 2221.3 | 1 | < 0.1% |
| 1616.5 | 1 | < 0.1% |
| 1000 | 1 | < 0.1% |
| 912.3 | 1 | < 0.1% |
| 899 | 1 | < 0.1% |
| 820 | 1 | < 0.1% |
| 800 | 2 | < 0.1% |
| 761.1 | 1 | < 0.1% |
extra
Real number (ℝ)
Zeros 
| Distinct | 48 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.4515984 |
| Minimum | -7.5 |
|---|---|
| Maximum | 14.25 |
| Zeros | 1290548 |
| Zeros (%) | 43.5% |
| Negative | 17548 |
| Negative (%) | 0.6% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | -7.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2.5 |
| 95-th percentile | 5 |
| Maximum | 14.25 |
| Range | 21.75 |
| Interquartile range (IQR) | 2.5 |
Descriptive statistics
| Standard deviation | 1.8041025 |
|---|---|
| Coefficient of variation (CV) | 1.2428385 |
| Kurtosis | 2.7855932 |
| Mean | 1.4515984 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.3976617 |
| Sum | 4303443.6 |
| Variance | 3.2547857 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1290548 | 43.5% |
| 2.5 | 705767 | 23.8% |
| 1 | 526527 | 17.8% |
| 5 | 192426 | 6.5% |
| 3.5 | 143201 | 4.8% |
| 6 | 23477 | 0.8% |
| 7.5 | 22407 | 0.8% |
| 9.25 | 10506 | 0.4% |
| -1 | 10287 | 0.3% |
| 4.25 | 9767 | 0.3% |
| Other values (38) | 29711 | 1.0% |
| Value | Count | Frequency (%) |
| -7.5 | 227 | < 0.1% |
| -6 | 319 | < 0.1% |
| -5 | 1146 | < 0.1% |
| -3.5 | 1 | < 0.1% |
| -2.5 | 5564 | 0.2% |
| -1.5 | 3 | < 0.1% |
| -1 | 10287 | 0.3% |
| -0.04 | 1 | < 0.1% |
| 0 | 1290548 | 43.5% |
| 0.01 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 14.25 | 2 | < 0.1% |
| 12.5 | 1 | < 0.1% |
| 11.75 | 2440 | 0.1% |
| 10.25 | 2911 | 0.1% |
| 10 | 642 | < 0.1% |
| 9.95 | 1 | < 0.1% |
| 9.25 | 10506 | 0.4% |
| 8.5 | 394 | < 0.1% |
| 8.2 | 2 | < 0.1% |
| 7.75 | 2373 | 0.1% |
mta_tax
Real number (ℝ)
Zeros 
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.48338231 |
| Minimum | -0.5 |
|---|---|
| Maximum | 4 |
| Zeros | 29707 |
| Zeros (%) | 1.0% |
| Negative | 34434 |
| Negative (%) | 1.2% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | -0.5 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 0.5 |
| median | 0.5 |
| Q3 | 0.5 |
| 95-th percentile | 0.5 |
| Maximum | 4 |
| Range | 4.5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.11776003 |
|---|---|
| Coefficient of variation (CV) | 0.24361676 |
| Kurtosis | 57.743719 |
| Mean | 0.48338231 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -7.4054623 |
| Sum | 1433046.8 |
| Variance | 0.013867425 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.5 | 2900474 | 97.8% |
| -0.5 | 34434 | 1.2% |
| 0 | 29707 | 1.0% |
| 4 | 5 | < 0.1% |
| 1.6 | 1 | < 0.1% |
| 0.8 | 1 | < 0.1% |
| 1.4 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| -0.5 | 34434 | 1.2% |
| 0 | 29707 | 1.0% |
| 0.5 | 2900474 | 97.8% |
| 0.8 | 1 | < 0.1% |
| 1.4 | 1 | < 0.1% |
| 1.6 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 4 | 5 | < 0.1% |
| 3 | 1 | < 0.1% |
| 1.6 | 1 | < 0.1% |
| 1.4 | 1 | < 0.1% |
| 0.8 | 1 | < 0.1% |
| 0.5 | 2900474 | 97.8% |
| 0 | 29707 | 1.0% |
| -0.5 | 34434 | 1.2% |
tip_amount
Real number (ℝ)
Zeros 
| Distinct | 4192 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.33587 |
| Minimum | -80 |
|---|---|
| Maximum | 428 |
| Zeros | 710292 |
| Zeros (%) | 24.0% |
| Negative | 102 |
| Negative (%) | < 0.1% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | -80 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2.7 |
| Q3 | 4.12 |
| 95-th percentile | 11.2 |
| Maximum | 428 |
| Range | 508 |
| Interquartile range (IQR) | 3.12 |
Descriptive statistics
| Standard deviation | 3.8965506 |
|---|---|
| Coefficient of variation (CV) | 1.1680763 |
| Kurtosis | 173.6392 |
| Mean | 3.33587 |
| Median Absolute Deviation (MAD) | 1.7 |
| Skewness | 5.0541375 |
| Sum | 9889600.3 |
| Variance | 15.183107 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 710292 | 24.0% |
| 2 | 145946 | 4.9% |
| 1 | 113565 | 3.8% |
| 3 | 75150 | 2.5% |
| 5 | 39511 | 1.3% |
| 2.8 | 39085 | 1.3% |
| 3.5 | 33831 | 1.1% |
| 2.1 | 32854 | 1.1% |
| 4 | 31961 | 1.1% |
| 1.5 | 31215 | 1.1% |
| Other values (4182) | 1711214 | 57.7% |
| Value | Count | Frequency (%) |
| -80 | 1 | < 0.1% |
| -66.02 | 1 | < 0.1% |
| -65.1 | 1 | < 0.1% |
| -52 | 1 | < 0.1% |
| -37.58 | 1 | < 0.1% |
| -33 | 1 | < 0.1% |
| -22.24 | 1 | < 0.1% |
| -22 | 2 | < 0.1% |
| -17.59 | 1 | < 0.1% |
| -16.19 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 428 | 1 | < 0.1% |
| 422.7 | 1 | < 0.1% |
| 303 | 1 | < 0.1% |
| 300 | 1 | < 0.1% |
| 280 | 1 | < 0.1% |
| 250 | 1 | < 0.1% |
| 220.88 | 1 | < 0.1% |
| 202 | 2 | < 0.1% |
| 175.17 | 1 | < 0.1% |
| 150 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
Zeros 
| Distinct | 1127 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5270212 |
| Minimum | -80 |
|---|---|
| Maximum | 115.92 |
| Zeros | 2753809 |
| Zeros (%) | 92.9% |
| Negative | 2035 |
| Negative (%) | 0.1% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | -80 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 6.94 |
| Maximum | 115.92 |
| Range | 195.92 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.1283097 |
|---|---|
| Coefficient of variation (CV) | 4.0383758 |
| Kurtosis | 72.868008 |
| Mean | 0.5270212 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.4859052 |
| Sum | 1562419.7 |
| Variance | 4.5297021 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2753809 | 92.9% |
| 6.94 | 191910 | 6.5% |
| 13.38 | 2031 | 0.1% |
| -6.94 | 1685 | 0.1% |
| 3.18 | 1417 | < 0.1% |
| 15.38 | 1378 | < 0.1% |
| 13.88 | 1144 | < 0.1% |
| 12.75 | 891 | < 0.1% |
| 14.75 | 574 | < 0.1% |
| 20.32 | 360 | < 0.1% |
| Other values (1117) | 9425 | 0.3% |
| Value | Count | Frequency (%) |
| -80 | 1 | < 0.1% |
| -60 | 1 | < 0.1% |
| -56.64 | 1 | < 0.1% |
| -55.34 | 1 | < 0.1% |
| -54.02 | 1 | < 0.1% |
| -52.57 | 1 | < 0.1% |
| -50 | 2 | < 0.1% |
| -49.26 | 1 | < 0.1% |
| -48.75 | 1 | < 0.1% |
| -47.26 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 115.92 | 1 | < 0.1% |
| 101.69 | 1 | < 0.1% |
| 99 | 1 | < 0.1% |
| 95.46 | 1 | < 0.1% |
| 90 | 1 | < 0.1% |
| 87 | 1 | < 0.1% |
| 85 | 2 | < 0.1% |
| 83 | 2 | < 0.1% |
| 82 | 1 | < 0.1% |
| 81 | 6 | < 0.1% |
improvement_surcharge
Categorical
Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.6 MiB |
| 1.0 | 2927710 |
|---|---|
| -1.0 | 35500 |
| 0.0 | 838 |
| 0.3 | 574 |
| -0.3 | 2 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0119752 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 2927710 | 98.8% |
| -1.0 | 35500 | 1.2% |
| 0.0 | 838 | < 0.1% |
| 0.3 | 574 | < 0.1% |
| -0.3 | 2 | < 0.1% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 1.0 | 2963210 | |
| 0.0 | 838 | < 0.1% |
| 0.3 | 576 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 2965462 | 33.2% |
| . | 2964624 | 33.2% |
| 1 | 2963210 | 33.2% |
| - | 35502 | 0.4% |
| 3 | 576 | < 0.1% |
total_amount
Real number (ℝ)
| Distinct | 19241 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.801505 |
| Minimum | -900 |
|---|---|
| Maximum | 5000 |
| Zeros | 416 |
| Zeros (%) | < 0.1% |
| Negative | 35504 |
| Negative (%) | 1.2% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | -900 |
|---|---|
| 5-th percentile | 10.87 |
| Q1 | 15.38 |
| median | 20.1 |
| Q3 | 28.56 |
| 95-th percentile | 80.19 |
| Maximum | 5000 |
| Range | 5900 |
| Interquartile range (IQR) | 13.18 |
Descriptive statistics
| Standard deviation | 23.385577 |
|---|---|
| Coefficient of variation (CV) | 0.87254718 |
| Kurtosis | 1570.4795 |
| Mean | 26.801505 |
| Median Absolute Deviation (MAD) | 5.8 |
| Skewness | 10.68236 |
| Sum | 79456384 |
| Variance | 546.88523 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 16.8 | 45432 | 1.5% |
| 12.6 | 43275 | 1.5% |
| 21 | 36556 | 1.2% |
| 15.12 | 26687 | 0.9% |
| 15.96 | 26396 | 0.9% |
| 14.28 | 25970 | 0.9% |
| 17.64 | 24525 | 0.8% |
| 18.48 | 24349 | 0.8% |
| 13.44 | 23938 | 0.8% |
| 19.32 | 23514 | 0.8% |
| Other values (19231) | 2663982 | 89.9% |
| Value | Count | Frequency (%) |
| -900 | 1 | < 0.1% |
| -801 | 2 | < 0.1% |
| -753.74 | 1 | < 0.1% |
| -710 | 1 | < 0.1% |
| -695.75 | 1 | < 0.1% |
| -671 | 1 | < 0.1% |
| -652.75 | 1 | < 0.1% |
| -637.87 | 1 | < 0.1% |
| -591 | 1 | < 0.1% |
| -578.96 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 5000 | 2 | < 0.1% |
| 2500 | 3 | < 0.1% |
| 2225.3 | 1 | < 0.1% |
| 1617.5 | 1 | < 0.1% |
| 1000 | 1 | < 0.1% |
| 940.93 | 1 | < 0.1% |
| 900 | 1 | < 0.1% |
| 821 | 1 | < 0.1% |
| 801 | 2 | < 0.1% |
| 775.48 | 1 | < 0.1% |
congestion_surcharge
Real number (ℝ)
Missing  Zeros 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 140162 |
| Missing (%) | 4.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.2561221 |
| Minimum | -2.5 |
|---|---|
| Maximum | 2.5 |
| Zeros | 217877 |
| Zeros (%) | 7.3% |
| Negative | 28825 |
| Negative (%) | 1.0% |
| Memory size | 22.6 MiB |
Quantile statistics
| Minimum | -2.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2.5 |
| median | 2.5 |
| Q3 | 2.5 |
| 95-th percentile | 2.5 |
| Maximum | 2.5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.82327467 |
|---|---|
| Coefficient of variation (CV) | 0.36490697 |
| Kurtosis | 12.724867 |
| Mean | 2.2561221 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -3.5314914 |
| Sum | 6372331 |
| Variance | 0.67778118 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2.5 | 2577755 | 87.0% |
| 0 | 217877 | 7.3% |
| -2.5 | 28824 | 1.0% |
| 0.75 | 3 | < 0.1% |
| 1 | 2 | < 0.1% |
| -0.75 | 1 | < 0.1% |
| (Missing) | 140162 | 4.7% |
| Value | Count | Frequency (%) |
| -2.5 | 28824 | 1.0% |
| -0.75 | 1 | < 0.1% |
| 0 | 217877 | 7.3% |
| 0.75 | 3 | < 0.1% |
| 1 | 2 | < 0.1% |
| 2.5 | 2577755 | 87.0% |
| Value | Count | Frequency (%) |
| 2.5 | 2577755 | 87.0% |
| 1 | 2 | < 0.1% |
| 0.75 | 3 | < 0.1% |
| 0 | 217877 | 7.3% |
| -0.75 | 1 | < 0.1% |
| -2.5 | 28824 | 1.0% |
Airport_fee
Categorical
Imbalance  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 140162 |
| Missing (%) | 4.7% |
| Memory size | 22.6 MiB |
| 0.0 | 2586789 |
|---|---|
| 1.75 | 232752 |
| -1.75 | 4921 |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.0858903 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 2586789 | 87.3% |
| 1.75 | 232752 | 7.9% |
| -1.75 | 4921 | 0.2% |
| (Missing) | 140162 | 4.7% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0.0 | 2586789 | 91.6% |
| 1.75 | 237673 | 8.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 5173578 | 59.4% |
| . | 2824462 | 32.4% |
| 1 | 237673 | 2.7% |
| 7 | 237673 | 2.7% |
| 5 | 237673 | 2.7% |
| - | 4921 | 0.1% |